chore: move rivetkit to task model#4680

Draft
NathanFlurry wants to merge 1 commit into 04-16-chore_rivetkit_to_rust from 04-19-chore_move_rivetkit_to_task_model

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app Bot commented Apr 20, 2026

🚅 Deployed to the rivet-pr-4680 environment in rivet-frontend

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| kitchen-sink | ❌ Build Failed (View Logs) | Web | Apr 24, 2026 at 12:32 pm |
| website | 😴 Sleeping (View Logs) | Web | Apr 22, 2026 at 8:58 am |
| frontend-cloud | ❌ Build Failed (View Logs) | Web | Apr 22, 2026 at 8:46 am |
| frontend-inspector | ❌ Build Failed (View Logs) | Web | Apr 22, 2026 at 8:46 am |
| mcp-hub | ✅ Success (View Logs) | Web | Apr 22, 2026 at 8:45 am |
| ladle | ❌ Build Failed (View Logs) | Web | Apr 20, 2026 at 12:09 am |

Member Author

NathanFlurry commented Apr 20, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry mentioned this pull request Apr 20, 2026
11 tasks
@claude

claude Bot commented Apr 20, 2026

Code Review: PR #4680 - chore: move rivetkit to task model


Overview

This is a substantial refactor (~33k lines changed across 218 files) that replaces the old implicit event-loop/lifecycle model in rivetkit-core with an explicit task-based model. The core change is a new ActorTask struct in task.rs (1,646 lines) with a tokio::select! event loop, replacing the deleted lifecycle.rs. Key supporting additions: task_types.rs, work_registry.rs, and the new async_counter.rs utility primitive.


Positive Aspects

  • Clean state machine: ShutdownPhase enum and install_shutdown_step drive a well-structured, sequential shutdown with RAII and panic catching via AssertUnwindSafe.
  • Event-driven drains: The old 10ms poll loops in sleep.rs are replaced with AsyncCounter::wait_zero(deadline) and Notify, which removes the polling latency and idle wakeups of the previous approach.
  • WorkRegistry/RegionGuard: Clean RAII-based work tracking with atomic counters. The panic-unwind test is a nice safety net.
  • async_counter.rs: Well-designed lock-free primitive with correct Relaxed increment / AcqRel decrement / Acquire load ordering. The enable-before-check pattern for notified() correctly avoids the TOCTOU race.
  • Buffered actor messages: New buffered_actor_messages in EnvoyContext fixes the race where WebSocket messages arrive before actor registration.
  • Hibernation restore as shared map: Moving HWS restore from ToActor::HwsRestore channel message to pending_hibernation_restores removes an ordering dependency between the envoy loop and actor init.
  • Test coverage: 2,932-line tests/modules/task.rs with good breadth including shutdown hooks, lifecycle events, and time-controlled tests.
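The RAII work tracking and deadline-bounded drain praised in the WorkRegistry/RegionGuard and async_counter.rs bullets can be illustrated with a std-only sketch. This is not the PR's code: `DrainCounter` and `DrainGuard` are hypothetical names, and a `Mutex`/`Condvar` pair stands in for the async `Notify`, so `wait_zero` here blocks a thread instead of awaiting. The atomic orderings follow the ones named above, and the waiter checks the count while holding the lock, which plays the role of the enable-before-check pattern.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::time::Instant;

/// Hypothetical blocking stand-in for the PR's AsyncCounter.
pub struct DrainCounter {
    value: AtomicUsize,
    lock: Mutex<()>,
    zero: Condvar,
}

/// Hypothetical RAII guard representing one unit of in-flight work.
pub struct DrainGuard {
    counter: Arc<DrainCounter>,
}

impl DrainCounter {
    pub fn new() -> Arc<Self> {
        Arc::new(Self {
            value: AtomicUsize::new(0),
            lock: Mutex::new(()),
            zero: Condvar::new(),
        })
    }

    /// Relaxed increment: entering work publishes no data to other threads.
    pub fn enter(self: &Arc<Self>) -> DrainGuard {
        self.value.fetch_add(1, Ordering::Relaxed);
        DrainGuard { counter: Arc::clone(self) }
    }

    /// Acquire load pairs with the AcqRel decrement in `Drop`.
    pub fn load(&self) -> usize {
        self.value.load(Ordering::Acquire)
    }

    /// Blocks until the count is zero or `deadline` passes.
    /// Returns true if the count reached zero.
    pub fn wait_zero(&self, deadline: Instant) -> bool {
        // Hold the lock while checking so a concurrent final decrement
        // cannot notify between the check and the wait.
        let mut held = self.lock.lock().unwrap_or_else(|e| e.into_inner());
        loop {
            if self.load() == 0 {
                return true;
            }
            let now = Instant::now();
            if now >= deadline {
                return false;
            }
            let (next, _timed_out) = self
                .zero
                .wait_timeout(held, deadline - now)
                .unwrap_or_else(|e| e.into_inner());
            held = next;
        }
    }
}

impl Drop for DrainGuard {
    fn drop(&mut self) {
        // Runs on normal exit and during panic unwinding alike.
        if self.counter.value.fetch_sub(1, Ordering::AcqRel) == 1 {
            // Take the lock before notifying so the wakeup is not lost.
            let _held = self.counter.lock.lock().unwrap_or_else(|e| e.into_inner());
            self.counter.zero.notify_all();
        }
    }
}
```

A caller would hold the guard returned by `enter()` for the duration of the work and pass a deadline to `wait_zero` during shutdown. Because the decrement lives in `Drop`, a panicking task still releases its slot, which is the property the panic-unwind test mentioned above is checking.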

Issues

Issue 1 - Convention Violation: Mutex<HashMap> in envoy-client/src/context.rs

pub actors: Arc<StdMutex<HashMap<String, HashMap<u32, SharedActorEntry>>>>,
pub live_tunnel_requests: Arc<StdMutex<HashMap<[u8; 8], String>>>,
pub pending_hibernation_restores: Arc<StdMutex<HashMap<String, Vec<HibernatingWebSocketMetadata>>>>,

CLAUDE.md explicitly prohibits this: "Never use Mutex<HashMap<...>> or RwLock<HashMap<...>>. Use scc::HashMap (preferred)." These three fields should be scc::HashMap.

Issue 2 - Tracing Style Violations in task.rs

Two tracing::warn! calls embed reason_label directly in the message string, violating the structured logging convention:

tracing::warn!("{reason_label} shutdown timed out waiting for shutdown tasks");
tracing::warn!("{reason_label} shutdown timed out after disconnect callbacks");

Should use structured fields:

tracing::warn!(reason = reason_label, "shutdown timed out waiting for shutdown tasks");

All other warn/error calls in the same function correctly use reason = reason_label as a structured field.

Issue 3 - Double can_sleep().await in on_sleep_tick

on_sleep_tick checks can_sleep(), and if it returns No, calls reset_sleep_deadline(), which calls can_sleep() again. The state could change between the two awaits, so the two calls can disagree and the second one repeats work. Consider capturing the result of the first can_sleep() call and threading it into reset_sleep_deadline.
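A minimal sketch of that suggestion follows. Everything in it is an assumption for illustration: `SleepState`, the `CanSleep` enum, and `reset_sleep_deadline_with` are hypothetical, the real methods are async, and the body of the reset is a placeholder. The only point is that the verdict is computed once per tick and passed along.

```rust
/// Hypothetical verdict type; the review only states that can_sleep() may return `No`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CanSleep {
    Yes,
    No,
}

/// Hypothetical, synchronous stand-in for the sleep state in task.rs.
pub struct SleepState {
    pub active_work: usize,
    pub can_sleep_calls: usize,
    pub deadline_resets: usize,
}

impl SleepState {
    /// Placeholder check; counts calls so the single evaluation is observable.
    pub fn can_sleep(&mut self) -> CanSleep {
        self.can_sleep_calls += 1;
        if self.active_work == 0 {
            CanSleep::Yes
        } else {
            CanSleep::No
        }
    }

    /// Takes the verdict as an argument instead of calling can_sleep() again.
    pub fn reset_sleep_deadline_with(&mut self, verdict: CanSleep) {
        if verdict == CanSleep::No {
            // Placeholder for re-arming the sleep timer.
            self.deadline_resets += 1;
        }
    }

    /// Returns true when the actor may go to sleep on this tick.
    pub fn on_sleep_tick(&mut self) -> bool {
        let verdict = self.can_sleep(); // evaluated exactly once per tick
        match verdict {
            CanSleep::Yes => true,
            CanSleep::No => {
                self.reset_sleep_deadline_with(verdict);
                false
            }
        }
    }
}
```

With this shape, both branches act on the same snapshot of the state, so a change between the two awaits can no longer produce contradictory decisions within a single tick.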

Issue 4 - #[allow(dead_code)] on WorkRegistry and impl WorkRegistry

Both the struct and its impl block carry #[allow(dead_code)]. If fields are truly unused they should be removed; if they are used transitively through SleepController, the attribute should be dropped.

Issue 5 - Unnecessary Re-acquire After teardown() in SleepController

Once teardown_started is set, track_shutdown_task rejects all new tasks, so nothing can be added to the set after that point. The final re-lock to write the drained JoinSet back is therefore unnecessary; the JoinSet can simply be dropped after shutdown().await.


Minor Observations

  • CountGuard type alias: pub(crate) type CountGuard = RegionGuard; - these represent different concepts (active regions vs task counts). A brief doc comment would help prevent future confusion.
  • Test hooks use global OnceLock state: SHUTDOWN_CLEANUP_HOOK, LIFECYCLE_EVENT_HOOK, SHUTDOWN_REPLY_HOOK are process-global statics. Tests using these hooks need to run sequentially. Document this constraint.
  • Draft status / checklist: All boxes are unchecked. website/src/content/docs/actors/lifecycle.mdx was touched - verify it is consistent with the new task model before marking ready.
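One way to enforce the sequential-execution constraint on the hook-based tests, rather than only documenting it, is a shared test lock. The sketch below is an assumption-laden illustration, not code from the PR: `DEMO_HOOK`, `HOOK_TEST_LOCK`, and `lock_hook_tests` are hypothetical names, and `DEMO_HOOK` only demonstrates why a `OnceLock` static is process-global state (a second `set` is rejected).

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Hypothetical hook slot, standing in for statics such as SHUTDOWN_CLEANUP_HOOK.
/// A OnceLock can be set only once per process, so tests share whatever was installed.
pub static DEMO_HOOK: OnceLock<fn() -> u32> = OnceLock::new();

/// Hypothetical process-wide lock that every hook-using test acquires first.
static HOOK_TEST_LOCK: Mutex<()> = Mutex::new(());

/// Serializes tests that touch global hooks. A poisoned lock (left behind by a
/// test that panicked while holding it) is recovered so later tests still run.
pub fn lock_hook_tests() -> MutexGuard<'static, ()> {
    HOOK_TEST_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Each affected test would start with `let _serial = lock_hook_tests();` and keep the guard alive until it finishes. Running the suite with `cargo test -- --test-threads=1` is a coarser alternative that serializes every test, not only the ones that use hooks.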

Summary

The architectural direction is sound. The explicit task model with event-driven drains is a meaningful improvement over the previous polling-based lifecycle. Items to address before merge:

  1. Replace StdMutex<HashMap> with scc::HashMap in envoy-client/src/context.rs (three fields)
  2. Fix the two tracing format string style violations in task.rs
  3. Remove the unnecessary re-lock after teardown() in SleepController::teardown

@NathanFlurry NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 5ca4ef7 to 710f2df Compare April 22, 2026 08:44
@NathanFlurry NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 710f2df to 5b8e173 Compare April 22, 2026 09:04
@NathanFlurry NathanFlurry force-pushed the 04-16-chore_rivetkit_to_rust branch from 28a55d1 to f4ad70b Compare April 24, 2026 07:21
@NathanFlurry NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 5b8e173 to 02b0b04 Compare April 24, 2026 07:21
@NathanFlurry NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 02b0b04 to b2af041 Compare April 24, 2026 08:12
This was referenced Apr 24, 2026